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This paper describes a method for quantitatively comparing an A^-body model with a 
sample of discrete kinematic data. The comparison has two stages: (i) finding the optimum 
scaling and orientation of the model relative to the data; and (ii) calculating a goodness 
of fit, and hence assessing the plausibility of the model in vew of the data. 

The method derives from considering the data and model both as samples from some 
underlying binned distribution function, and applying probability theory arguments. 

As an example, I consider a published A^-body model for the Galactic bulge and 
disc, and fictitious /, 6, v measurements, and recover (with error estimates) the spatial and 
velocity scales of the model and the orientation of the bar. The fictitious data are actually 
derived from the model by assuming the mass scale and the solar position, but their size 
and extent mimics a recent survey of OH/IR stars. The results indicate that mass of the 
bulge and our viewing angle of the bar are usefully estimable from current surveys. 
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1. INTRODUCTION 



Kinematic surveys of a population of discrete objects are an increasingly important kind 
of data in galactic astronomy. The objects may be stars in globular clusters or elsewhere 
in the Galaxy (e.g., Mcylan & Mayor 1986), emission line objects in the Galaxy or other 
galaxies (e.g., CiarduUo et al. 1993, Hui 1993, Arnaboldi et al. 1994, Tremblay et al. 1995, 
Beaulieu 1996, Sevenster et al. 1997a,b), or galaxies in a cluster (e.g., CoUess & Dunn 
1996). Such surveys usually measure sky positions and line-of-sight velocities, but for 
some systems proper motions are also available (e.g., Spaenhauer et al. 1992). 

One would like to be able to throw these data at some dynamical analysis machine 
and reap all the dynamical results implicit in the data, but there is no such machine. 
Some progress towards this goal has been made, notably by D. Merritt and collaborators 
(see Merritt 1993, Merritt & Tremblay 1993, Merritt & Gebhardt 1994, and especially 
Merritt 1996). These papers develop methods for reconstructing mass profiles (including 
dark matter) from kinematical observations, in a model independent way. But at present 
they extend only to axisymmetric systems viewed in the equatorial plane. So for triaxial 
systems, and certainly for non-equilibrium systems like clusters of galaxies, it is basically 
A^-body simulations that have to be confronted with data. How can we best do this 
quantitatively? 

Generally speaking, there are three questions one would like answers to when com- 
paring A^-body models with observations. 

(i) How should a model be scaled and oriented to best fit the data? 

(ii) Could the data at hand have plausibly come from a particular model's distribution, 
or do the data rule out the model? 

(iii) If there are several plausible models, which one do the data favor? 

All three are answerable if we can calculate the likelihood function, which is the probability 
of having gathered the actual data under a particular model. Suppose for definiteness that 
the data consist of measurements of sky position and line-of-sight velocity v, with 
negligible errors. Let us also assume for now that the simulation is so fine grained that it 
effectively gives us a distribution function /. One usually thinks of / as a function of phase 
space variables, but we can change variables to express it as f{l,b,v,r]), where ry stands 
for three unmeasured numbers (e.g., distance and proper motion). Then the probability 
of drawing values lk,bk, from / is 



Assuming the data on different objects are independent, we have for the likelihood: 



Since /(/, 6, f , rf) will depend on the scalings and orientation adopted, we can fit for these 
parameters — the peak of prob (data | /) in the relevant parameter space estimates the pa- 
rameters and the broadness of that peak gives uncertainties. To answer question (ii), we 




(1) 
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can test if the value of prob (data| /) is typical of random data sets drawn for that /; if 
prob (data | /) is anomalously low, we can infer that / is inconsistent with the data. Ques- 
tion (iii) can be answered by comparing prob (data | /) for the various models available; 
there is an extra complication though, in that we must marginalize over the parameters 
for each model — see Sivia (1996) for a discussion of this point. I will not address model 
comparison in this paper. 

The contribution of this paper is to derive and test a practical approximation to the 

'in-principle' procedure above. We need an approximation because particle simulations 
do not give us / directly; we need to smooth somehow. Smoothings in general introduce 
biases, so we have to monitor for biases and correct for them if necessary. But bearing that 
caution in mind, the smoothing I propose to use is the simple-minded one of just binning in 
I, b, V, i.e., assuming that / is constant within boxes in I, b, v space. Let us say that for some 
choice of scaling and orientation parameters, the i-th. bin has rrii model points and Si data 
object points; also let M = '^^Trii and S = Si. This immediately suggests minimizing 
to obtain a best fit, but that is a bad idea. Minimizing implicitly assumes that the 
Si follow a Gaussian distribution, the mean and variance in this case being both equal to 
rriiS/M. This is fine if all the Sj » 1, but S being typically dozens to hundreds we do not 
have such luxury. Moreover, for bin sizes of interest, even the rrii may not always be large 
enough for shot noise to be negligible. The solution is to view both the sets rrii and Si as 
samples drawn from some underlying / that is constant within bins. The likelihood then 
takes the form 

prob (data | model) oc = ff ^"^^l^/^' . (3) 

rriilsil 

1=1 

This formula is derived in the Appendix, but note two intuitively desirable properties of 
W: (i) the {rrii + -§«)! factor favors large rrii coinciding with large Si, but the denominator 
discourages extremes like rrii = M at the bin with highest Si and elsewhere; and (ii) if 
some outlier observation lands in a bin with no model points (i.e., rrii — 0,Si = 1), that 
bin contributes unity to the product — in this sense W is robust against outliers. 

Although the formula (3) is symmetric in rrii and s^, operationally these two sets of 
numbers will play quite different roles. The Si derive from data and, for a given data 
set and binning, they are fixed. The rrii, on the other hand, depend on the scaling and 
orientation parameters and will vary as the those parameters are adjusted to maximize W. 

To explain the details of the use of W it is probably best to work through an example, 
and below we work through the problem of scaling and orienting A^-body models of the 
Milky Way bulge and inner disc from /, 6, v measurements. As it happened, it was this 
problem that led to the present work, but the bulge is a good example to illustrate anyway, 
for two reasons. Firstly, it is a triaxial system with the interesting complication that its 
depth is not negligible compared to its distance. Secondly, there are several both of data 
sets (te Lintel Hekkert et al. 1991, Beaulieu 1996, Sevenster et al. 1997a,b) and models 
(Sellwood 1993, Zhao 1996, Fux 1997) in the recent literature. 
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2. EXAMPLE: SELLWOOD'S GALACTIC MODEL 

Figure 1 shows an A^-body model of the Galaxy by Sellwood (1993). Note the bar in the 
bulge, which makes the bulge triaxial. The real Galactic bulge is now generally agreed to 
have a substantial bar (oriented such that the side nearer to us is receding); see Gerhard 
(1996) for a review. Hence the interest of comparing models such as Sellwood's with 
kinematic data. 
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Figure 1. Spatial positions of 43802 particles in Sellwood's (1993) model for the Galaxy. As 
the spiral arms in the upper panel suggest, the model rotates counterclockwise, so positive z 
corresponds to the South Galactic direction. The mock l,b,v data are generated by viewing the 
particles from a 7 o'clock position on the (x, y) plane at radius of 6. 
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Let us put ourselves at the point {—Rq sin (p, Rq cos ip, 0) in the A^-body model and 
look towards the bulge. To evaluate the observed quantities from this location, we rotate 
and scale the model thus: 



and then compute 



= X cos (p — y sm (p 

= X sin (p + y cos (p + Rq 

= z 

= Vsc&\eivx COS (p-Vy sin (p) - Vq 

= ^'scale(^'x sin ip + Vy COS (p) 
— ^scale^z 

r — X + y + z 

I = arctan(y', x') 

b = arcsin(2;'/'?'') 

V = {x'vx' + y'vy' + z'vg')/r'. 



(4) 



(5) 



Here Rq is our Galactocentric radius in model units (Sellwood suggests i? ~ 6), t'scaie is the 
scaling factor between real and model velocity units, vq is our tangential velocity in real 
units, <p is the viewing angle of the bar, and our radial velocity is assumed or corrected 
for. In the convention implied by equations (4) and (5), (p between and 90° means that 
the nearer side of the bar is at positive I and (because the model has positive rotation) 
positive V. The real Galactic bar is believed to be in such an orientation. 

The asymmetry between the spatial and velocity parts of equation (4) may seem 
odd — why not 

x' = rscaie{x COS (p - y sin (p) 



y 



rscc.ie{x sin (p + y cos ip) + Rq 



(6) 



^ — ^scale-^ • 



? 



We would need an extra parameter rgcaie if we were considering proper motion data (avail- 
able for Baade's window stars in Spaenhauer et al. 1992). But from equation (5) the 
observables all depend only on the ratio Ro/f scales so in equation (4) we drop Tgcaie- 

Then 

Rq in effect becomes a surrogate for the spatial scale: if our true Galactocentric distance 
is 8.5 kpc then 

'8.5\ / kpc 
Rq J \ model unit 



''scale 



With proper motion data, in principle rgcaie and Rq could both be determined, thus provid- 
ing the actual Galactocentric distance; but with only /, 6, v that distance must be supplied 
separately to get rgcaie- Once we have rgcaie: we can get the mass because the scale for 
G X mass is rgcaie x '^^scaie (sincc GM has dimensions of L^T~^). 

Consider now a survey of l,b,v measurements, which we would like to compare with 
the simulation and infer Rq, (p, Vsc&ie, and vq to the extent possible. The first step is 
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to choose the bins in l,b,v for comparison — more on choosing bins below, but for the 
moment suppose we have chosen our B bins. This sets the s^. The will depend on 
what scaling and orientation parameters we choose; for any choice we can put the model 
particles through the transformations (4) and (5), bin them up, and randomly pick M out 
of all the particles that fall into our B bins, thus getting the m^. Then we calculate W, 
which clearly depends on the parameters.^ Clearly, our strategy to estimate the parameters 
will be to vary them so as to maximize W. Getting error bars and testing the model are a 
little more involved, and discussed in detail in Section 3. They will involve simulating data 
sets from the model. Generating a simulated data set Si from the model is like generating 
rrii, except that we choose S particles rather than M. Sometimes we will be calculating 
W for two sets of occupancies Sj and m^, both of which come from the model, but using 
different parameter values.^ 

How to choose the bins? Because of the assumptions that go into the derivation of W, 
it is best to avoid unequally sized bins. But there is no need to have bins in unsurveyed 
regions, so bins need not be contiguous. The bin size requires some thought. I cannot 
suggest any definite prescription for the bin size, but there are two guidelines. Firstly, B 
should be several times smaller than M, so that the rrii can be large enough to actually 
carry some information about the distribution function; B ~ M/5 seems serviceable. 
There is no problem with B ^ S; after all, the continuous limit is M ^ B ^ S. Secondly, 
the binning should not be so coarse that is misses important features in the distribution 
function. Too coarse a binning can lead to strange biases, as the following suggests. The 
scale height of the bulge (as seen from the solar system) is about 2.2°; suppose the bins 
were 5° in b. When fed data binned thus, any model fitting procedure is likely to respond 
by fitting a model with an increased scale in b. An easy way to do this is to increase 
-Ro — but then the fit would have to compensate for the scale in /, which it might do by 
reducing to make the bar more nearly end-on; but a nearly end-on bar will tend to give 
larger v values, and this in turn might be compensated by reducing Vgcaie- 



^ The number of model particles that fall into our B bins will depend on the parameters. Particles 
may fall outside the survey region where we might have no bins. But M must be kept the same for all 
parameter values, i.e., we must always choose a subset of size M of those model particles that do fall into 
our B bins. Otherwise the formula (3) for W becomes invalid (see the Appendix). 

^ If we are going to compare mock data generated from a model with that model, it is important to 

then remove from the model those points which went into the mock data. A model particle should never 
contribute to both Si and mj at the same time. The reason is that the data are not supposed to correspond 
exactly to any model particles, only to have come from the same distribution function. 
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3. USE OF THE W FUNCTION 

It is straightforward to incorporate the hkehhood W in standard Monte-Carlo procedures 
for parameter estimation and model testing, and the following describes how this can be 
done. The approach here is not the only possible one, and Bayesian purists would reject 
it entirely; but it seems computationally the most tractable. 

It is helpful to consider two functions, D and Q. 

D{uj) is a function that generates a data set from the model using parameter values 
u). D is probabilistic, so there can be many possible data sets Di, D2, . . . (u) from the 
same model and parameters. If the model is correct and the true values are cjtrue, then 
the observed data can be thought of as one realization of D: 

-Dobs = -D(a;true)- (7) 

W depends on both the data and the parameters: 

W = W{D, uj') or : W{D(uj), u'). (8) 

In general uj' ^ uj; uj leads to the data and hence to the Si, but the rrii are got by applying 
a possibly different value uj' to the model. We now define the function 0, thus: 

Q(D) = u': W{D,u') is maximum. (9) 

To calculate ^(-D), we don't need to know what uj value gave D; but Vt{D) is an esti- 
mator for that unknown value (in fact, a maximum likelihood estimator, because VF is a 
likelihood).^ 

To estimate parameters we calculate cjest = ^^(-Dobs)- For error bars on c^est we want 
the scatter in f2(Z)(a;true))- But since in real applications we won't know a;true) we can 
take instead the scatter in 0(D(a;est)); from this we can read off desired confidence limits. 
This is standard Monte-Carlo error estimation — see Figures 15.6.1 and 15.6.2 in Numerical 
Recipes by Press et al. (1992). 

Testing the model is a little more complicated. We need some statistic that measures 
the goodness of a parameter fit, but the well known ones do not help us: is inappropriate 
for the reasons given in Section 1, and KS and its relatives are inapplicable because the 
data are not one-dimensional. However, there is an obvious choice of statistic: W itself. 
To apply it, we compare VF(-Dobs, "^est) with the distribution of W {D {uj true) est)- If 
VF(Dobs5 "^est) lies in the lowest percentile of the distribution of W{D{uJirne)-,^est)-, then 
the model is rejected at 99% significance, and so on. For and also for KS and its 



^ If we had 

{n{D{uj)))=uj, 

(the average being over an ensemble of L)„ with lo held fixed) then O would be an unbiased estimator. 
As suggested in Section 1, in practice Q. will have some bias, because for one thing the binning process 
introduces bias. But that is not a problem provided we can test for bias and correct for it where required. 
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relatives, the distribution corresponding to W{D{ujtrue),'^est) is model independent. In 
our case we will need to calculate the distribution; but again, we won't know a;egt, so we 
will have to substitute the distribution of W{D{uJest)j "^est)- 

Note that if the goodness of fit test leads to rejection of the model, parameter estimates 
from that model must be rejected too (even if the error bars claim high accuracy). 

The main algorithmic problem is locating the maximum of W in the multi-dimensional 
parameter space of uj. My implementation basically follows Charbonneau (1995). Pro- 
grams are available to anyone interested. 



4. SIMULATIONS WITH SELLWOOD'S MODEL 

In this section I present some Monte-Carlo simulations to gain some idea of which bulge 
parameters can be constrained from current surveys and how well. 

Consider Sellwood's model with o^true being Rq = 6 model units, ip = 30°, vq = 
220 km/sec, and f scale = 300 km/sec/model unit. All these are plausible values, and using 
them I computed fi(D(a;true)) for 32 mock surveys, each having 300 objects in the range 
\l\ < 10.5°, \b\ < 3.25°, \v\ < 320 km/sec. The size and extent mimics the symmetric part 
of the survey of bulge OH/IR stars by Sevenster et al. (1997a,b). (The survey region need 
not be symmetric — see below.) I used 21 x 13 x 16 bins in /, 6, v. Figure 2 shows the results: 
the ranges of the axes in i?o, t'o, and ip are the ranges I searched; for Vgcaie I searched the 
range 0-500. 

Figure 3 shows a sort of direct visual data- model comparison for one (the first) of 
the mock surveys. Such plots are useful in checking for programming goofs, but otherwise 
there is not much information one can extract from them. 

Figure 4 illustrates that the maximum of W of the first of the mock surveys is typical 
of the values should expect from this model, as expected since the mock survey came from 
the model. With real data such a plot would test whether the data could plausibly have 
come from the model being studied. 

From the results in Figure 2, the medians and 68% range in f2(D(a;true)) are = 
(29ly)°, Vscaie = 290]']33. These numbers indicate the sort of bias and error bars we can 
expect from a survey of this size and extent. We see that fscaie can be estimated to ~ 10% 
and (f to ^ 10° with no need to correct for bias. On the other hand, we get no useful 
information on vq and Rq. That vq is not constrained by such data is not surprising, since 
it almost perpendicular to what is measured. But the inability to infer Rq is puzzling, 
especially considering the impressively tight constraint on (p. Evidently, we must use 
integrated light to estimate Rq. 

It is interesting to see what happens as the surveys get bigger. Figure 5 shows 
Q{D{u>true)) for mock surveys when the size is extended to 500 and the I range to 
—45° < I < 10.5°, which mimics the full size and extent of Sevenster et aZ.'s (1997a,b) 
OH/IR survey. We now find approximate medians and 68% ranges of <^ = (29^.1)° and 
■i'scaie = 290^if , but the outliers are noticeably less distant. We also being to notice a bias 
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Figure 2. Recovery of parameter values from mock surveys of 300 objects in the range \l\ < 
10.5°, |6| < 3.25°. The filled circles show the parameter values used to generate the mock data, 
i.e., Wtrue- The open circles show the parameters recovered from 32 different mock surveys, i.e., 
Q{Dq. .D^i^LOtiue))- The open circles with three cuts inside refer to the mock survey £>o('^true) 
that happened to be first; this Dq is used again for Figures 3 and 4. 



towards low estimates for t'scaie and high estimates for Rq; if the survey size is increased 
further, the scatter in 0(D(a;true)) reduces further, and the biases in fgcaie arid Rq become 
correspondingly more noticeable. 

To summarize the results, these survey simulations indicate that current surveys can 
constrain the viewing angle of bulge simulations to < 10° and the velocity scale to < 
10% (at 68% confidence). The spatial scales will need to be set independently using 
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Figure 3. The open circles show l,b and l,v for a mock survey -Do('^true)- The filled circles 
show I, b and v for a mock survey Di(ujest), where Uest = ^iDo{uJtrue))- With a real survey one 
would use -Dobs instead of -Do(^^true)- 



integrated light. Kalnajs (personal communication) obtains ~ 10% or better constraints 
on Rq by comparing A^-body models and integrated light from COBE. Combining with 
< 10% uncertainties on Vsc&ie, it appears that AT-body models could be scaled in mass to 
~ 25%. The resulting predictions for microlensing optical depths would easily be tight 
enough for interesting confrontations with bulge microlensing observations. 



I am grateful to Ken Freeman for posing this problem and to Sylvie Beaulieu and Maartje 
Sevenster for teaching me about bulge observations. Thanks also to Agris Kalnajs and 



10 Comparing discrete kinematics and simulations 




InW 

Figure 4. The vertical dashed hne in this figure shows the value of In W^(Do(i^true), i^est) where 
u;est = ^^(-Do(i^true)) which is marked in Figure 2. With a real survey -Do('^true) would be 
replaced by -Dobs- The rising curve is the cumulative distribution of lnW(Di. .-Dioo('*^true)5 'i'est)- 
As expected, -Do(a;true) appears as typical of the distribution Dn(uJtTue)- 



Maartje Sevenster for the appropriate mixture of enthusiasm and skepticism about W. 
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Figure 5. Similar to Figure 2, but this time the mock surveys have 500 objects each in the 
range -45° < I < 10.5°, |fe| < 3.25°. 
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APPENDIX 

This appendix derives the hkehhood formula (3) from probabihty theory arguments. 

Continuing the notation of Section 1, we suppose that there are B bins in all, and 
that in the i-th bin f = fi, and both and Sj are drawn from /j. 

The joint probability for the bin occupancies is the product of two multinomial dis- 
tributions: 

prob {si, rrii \ /,) = M\S\T\ (Al) 

Equation (Al) could be used to estimate the fi, with uncertainties. But as we have no 
particular interest in the /, as such, we marginalize them out in the usual way, which is 
to integrate over all allowed values of the fi — that means all combinations of values of the 
fi between and 1, subject to /i = 1. Distributions of fi that most resemble the rui 
and Si will, according to (Al), contribute most to the integral. Marginalization is just an 
application of the additive rule for probabilities. Using the identity 

n / /."' *) * (e, - 1) = (jvT^ n (^2) 

we get 

Ml SI {B- 1)1 ^ {m, + s,)\ 

P^"^ "^^^ = (M + s + B-iy. n (A3) 

Putting = in (A3) gives prob (m^), and hence 

If S = 1 then the probabilities in (A3) and (A4) become unity, as they should. For the 
sort of applications of interest in this paper M, S, and B are fixed, so we can ignore the 
normalization in (A4) and work only about W as defined in equation (3). 

If M is large enough compared to S that rn^ + Sj ~ mj and (A4) simplifies to 

prob(si|mj) oc (A5) 

. Si. 

which amounts to saying that fi = rui/M because the shot noise in the rui is negligible. 
If M is large enough that we can make the bins so small that each = or 1, but still all 
mj ^ 1, then (A5) simplifies further, to 

prob (sj I mj) oc JJ^ m^, (A6) 

which amounts to the formula (2) for the continuous case. Thus, (A4) has the expected 
large-mj limits. 



